Phylogenetic Diversity within Seconds
Authors
Abstract
We consider a (phylogenetic) tree with n labeled leaves, the taxa, and a length for each branch in the tree. For any subset of k taxa, the phylogenetic diversity is defined as the sum of the branch lengths of the minimal subtree connecting the taxa in the subset. We introduce two time-efficient algorithms (greedy and pruning) to compute a subset of size k with maximal phylogenetic diversity in O(n log k) and O[n + (n - k) log(n - k)] time, respectively. The greedy algorithm is an efficient implementation of the so-called greedy strategy (Steel, 2005; Pardi and Goldman, 2005), whereas the pruning algorithm provides an alternative description of the same problem. Both algorithms compute within seconds a subtree with maximal phylogenetic diversity for trees with 100,000 taxa or more.
[Biodiversity conservation; Comparative genomics; Greedy algorithm; Phylogenetic diversity; Phylogenetic tree; Pruning algorithm.]
Recently, Steel (2005) and Pardi and Goldman (2005) have shown that being greedy works if one is interested in selecting k taxa from a phylogenetic tree that maximize the phylogenetic diversity. The term phylogenetic diversity (PD) was coined by Faith (1992) to provide an effective measure of the diversity of a group of taxa. The optimal PD describes the amount of diversity embraced by a properly chosen subset of taxa. Faith (1992) applied PD to place conservation priorities on different taxa, where the taxa to protect reflect a certain value of taxonomic diversity. Thus, some measurable indicator of biodiversity defined on different scales (taxa, group of taxa, ecosystems, etc.) is assigned to the corresponding systematic categories. With the advent of molecular genetics, evolutionary divergence on the genomic level may also serve this purpose (Pardi and Goldman, 2005). For the following, the precise nature of the measure of phylogenetic diversity is not relevant (cf. Humphries et al., 1995; Williams and Araujo, 2002, for a discussion on diversity measures).
Phylogenetic diversity should simply describe the overall value of a group of taxa in terms of genetic diversity, regional diversity, or social diversity. Moreover, it is required that these measures can be mapped onto a phylogenetic tree such that the branches of the tree receive non-negative weights. The problem is then as follows: from a tree with n taxa, one wants to identify k taxa that retain the maximal phylogenetic diversity, thereby taking into account the fact that, due to restricted resources, only a certain percentage of the taxa can be sustained. Steel (2005) and Pardi and Goldman (2005) have proven that a greedy approach yields the optimal set with respect to PD. The greedy strategy repeatedly selects the taxon that adds the most divergence to the already chosen set of taxa. The procedure is repeated until k taxa are found. Both proofs apply, directly or indirectly, the theory of weighted matroids and greedy algorithms (Korte et al., 1991). From this theory it follows that an algorithm with time complexity O(n log n) is possible. In the following, we will suggest a time-efficient greedy phylogenetic diversity algorithm (gPDA). Moreover, a different but easier-to-implement algorithm, the pruning phylogenetic diversity algorithm (pPDA), will be introduced. Both algorithms compute the optimal set of k taxa for large phylogenies within seconds.
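To make the definition of PD and the greedy strategy concrete, the following is a minimal Python sketch, not the gPDA or pPDA of this paper. It is a naive version that recomputes PD for every candidate taxon, so it does not reach the running times quoted above. It assumes an unrooted tree given as a list of (node, node, branch length) triples, and it assumes that the greedy selection starts from the pair of taxa with the largest path distance. The function names (build_adjacency, phylogenetic_diversity, greedy_pd) and the example tree are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Turn (u, v, length) triples into an undirected adjacency list."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj


def phylogenetic_diversity(edges, taxa):
    """Sum of branch lengths of the minimal subtree connecting `taxa`."""
    taxa = set(taxa)
    if len(taxa) < 2:
        return 0.0
    adj = build_adjacency(edges)
    root = next(iter(taxa))  # root the tree at one of the selected taxa
    parent = {root: (None, 0.0)}
    order = []  # every node appears after its parent in this list
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        for neighbour, length in adj[node]:
            if neighbour not in parent:
                parent[neighbour] = (node, length)
                stack.append(neighbour)
    # Number of selected taxa in the subtree hanging below each node.
    count = {node: (1 if node in taxa else 0) for node in order}
    total = 0.0
    for node in reversed(order):  # children are visited before parents
        par, length = parent[node]
        if par is None:
            continue
        # The root is a selected taxon, so the branch above `node` lies in
        # the minimal subtree exactly when a selected taxon sits below it.
        if count[node] > 0:
            total += length
        count[par] += count[node]
    return total


def greedy_pd(edges, leaves, k):
    """Naive greedy choice of k taxa (assumes 2 <= k <= number of leaves)."""
    leaves = list(leaves)
    if not 2 <= k <= len(leaves):
        raise ValueError("k must lie between 2 and the number of leaves")
    # Assumed starting point: the two taxa that are farthest apart.
    chosen = set(max(combinations(leaves, 2),
                     key=lambda pair: phylogenetic_diversity(edges, pair)))
    while len(chosen) < k:
        # Add the taxon that gives the largest PD for the enlarged set.
        best = max((t for t in leaves if t not in chosen),
                   key=lambda t: phylogenetic_diversity(edges, chosen | {t}))
        chosen.add(best)
    return chosen


# Hypothetical four-taxon tree with internal nodes "x" and "y".
example_edges = [("A", "x", 1.0), ("B", "x", 2.0), ("x", "y", 1.0),
                 ("C", "y", 3.0), ("D", "y", 0.5)]
example_leaves = ["A", "B", "C", "D"]
selected = greedy_pd(example_edges, example_leaves, 3)
print(sorted(selected), phylogenetic_diversity(example_edges, selected))
```

In this toy tree the sketch first picks B and C, the most distant pair, and then adds A, because the path to A contributes more branch length than the path to D. Every step rescans all remaining taxa, which is the cost that a more careful implementation would need to avoid.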
Publication date: 2006